Doublelength Floating-Point Arithmetic on the TMS320C30


Abstract 


This chapter, reprinted from IEEE Micro Magazine, describes the
third generation of the TMS320 family of digital signal processors,
the TMS320C30. It describes the origin and development of the
32-bit floating-point device. The topics covered include:

• An overview of the characteristics of the TMS320C30 processor
• A description of the architecture of the 320C30
• A description of the software features of the programmable DSP
• A description of development tools and support
• The types of demanding applications for which the 320C30 is
  most suitable

Support graphics include:

• Architecture block diagrams
• Diagrams showing on-chip memory, cache and buses, the 320C30
  central processing unit, and peripheral bus and peripherals
• A pipeline of 320C30 instructions
• Sample code implementations

The chapter closes with an endnote about the likely direction of
this technology, a list of references and some biographical
information about the authors.


Doublelength Floating-Point Arithmetic on the TMS320C30


In the past, extended-precision arithmetic has been implemented only on fixed-point
processors. The introduction of the TMS320C30 Digital Signal Processor (DSP), a
floating-point 33-MFLOP device, enables us to represent multilength floating-point math
in terms of singlelength floating-point math. Extended-precision arithmetic gives designers
more accuracy in applications such as digital filtering, FFTs, image processing, and
control.


This application report describes how to extend the available precision of
floating-point arithmetic on the TMS320C30. Our emphasis is on implementing an efficient
extension of the available precision while minimizing both the execution time and the
memory usage.


The structure of this report is as follows: The first section describes the TMS320C30
DSP floating-point number representation. The second section discusses doublelength
arithmetic and some basic definitions. The third section discusses the algorithms used along
with the TMS320C30 implementation. An analysis of the error introduced by the algorithm
is presented in the fourth section. The last section provides an insight into generating
C-callable functions from assembly language routines. Finally, the appendix provides the
source listings for the extended-precision arithmetic.


Floating Point Format 
The TMS320C30 supports three floating-point formats [1]. 


• Short floating-point format, used to represent immediate operands,
  consisting of a 4-bit exponent and a 12-bit mantissa.

• Single-precision format, used for regular floating-point value
  representation, consisting of an 8-bit exponent and a 24-bit mantissa.

• Extended-precision format, used with the extended-precision registers,
  consisting of an 8-bit exponent and a 32-bit mantissa.


For the extended-precision algorithms to work properly on the DSP, it is important
to start from the highest-precision floating-point format available in the system that is
used for basic floating-point operations. The single-precision format is of particular
interest in developing the TMS320C30 code for extended-precision floating-point
operations. Therefore, a working knowledge of the properties of this format is essential for
the concepts presented in this application report.


In the single-precision format, the floating-point number is represented by an 8-bit 
exponent field (e) in two’s complement notation, and a two’s complement 24-bit mantissa 
field (f) with an implied most-significant nonsign bit. Bit 23 of the mantissa indicates the 
sign (s), as shown in Figure 1. 


 31        24  23  22                      0
|     e      | s |            f             |


Figure 1. Single-Precision Floating-Point Format of the TMS320C30


Operations are performed with an implied binary point between bits 23 and 22. When 
the implied most-significant nonsign bit is made explicit, it is located to the immediate 
left of the binary point after the sign bit. We show the implied bit explicitly throughout 
this application report for clarity. The floating-point number x is expressed as follows: 


x = 01.f × 2^e    if s = 0;
x = 10.f × 2^e    if s = 1;
x = 0             if e = -128, s = 0, and f = 0
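
The expression above can be checked on a host machine with a small decoder. The
following is a minimal sketch, not part of the original listings: the names pow2 and
c30_to_double are hypothetical, and the host double type is assumed to hold every
single-precision TMS320C30 value exactly.

```c
#include <stdint.h>

/* Return 2^n for small |n| by repeated scaling (hypothetical helper). */
static double pow2(int n)
{
    double r = 1.0;
    for (; n > 0; --n) r *= 2.0;
    for (; n < 0; ++n) r *= 0.5;
    return r;
}

/* Decode a 32-bit TMS320C30 single-precision word into a host double. */
double c30_to_double(uint32_t word)
{
    int e = (int)(word >> 24);           /* 8-bit exponent field, bits 31-24 */
    uint32_t s = (word >> 23) & 1u;      /* sign bit, bit 23                 */
    uint32_t f = word & 0x7FFFFFu;       /* fraction, bits 22-0              */

    if (e >= 128)
        e -= 256;                        /* exponent is two's complement     */
    if (e == -128 && s == 0 && f == 0)
        return 0.0;                      /* the encoding of zero given above */

    /* 01.f when s = 0; 10.f, that is -2 + 0.f, when s = 1 */
    double mantissa = (s ? -2.0 : 1.0) + (double)f * pow2(-23);
    return mantissa * pow2(e);
}
```

For example, c30_to_double(0x217FFFFF) evaluates (2 - 2^{-23}) × 2^{33}, the value
1.71798682 × 10^{10} used in the addition example later in this report.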


The range and precision available with the TMS320C30 single-precision
floating-point format are illustrated by the following values:


Most Positive:    x = +3.4028234 × 10^{+38}
Least Positive:   x = +5.8774717 × 10^{-39}
Least Negative:   x = -5.8774724 × 10^{-39}
Most Negative:    x = -3.4028236 × 10^{+38}


Doublelength Floating-Point - The Basics


The techniques used to develop doublelength results in this application report
require a singlelength floating-point system and arithmetic that satisfy certain conditions.
The TMS320C30 implementation takes the singlelength system as the highest
floating-point precision system available. The algorithms presented do not require a
doublelength accumulator with respect to the singlelength system used. The
extended-precision formats available are used to control the truncation or rounding of the
single-precision results.


The doublelength arithmetic presented here increases precision of a given
floating-point operation without the need for a doublelength accumulator. Using this method,
the result of the floating-point operations on two single-precision numbers can be determined
exactly. If x and y are two such numbers and the desired operation is addition, the result
can be represented as a pair of floating-point numbers z and zz. The z value represents
the most significant portion of the floating-point operation, while zz represents the least
significant portion of the floating-point operation.


As an example, consider the result of the exact addition of two floating-point numbers 
x and y that are expressed in the single-precision format of the TMS320C30: 


x = 217FFFFFh (decimal: 1.71798682 × 10^{10})
y = 0C7FFFFFh (decimal: 8.19199951 × 10^{3})


The values are represented in the TMS320C30 binary equivalent as follows: 


x = 2^{33} × 01.111 1111 1111 1111 1111 1111b
y = 2^{12} × 01.111 1111 1111 1111 1111 1111b


Addition of two floating-point numbers requires aligning the two variables x and y (1): 


x = 2^{33} × 01.111 1111 1111 1111 1111 1111b
y = 2^{33} × 00.000 0000 0000 0000 0000 0111 1111 1111 1111 1111 1111 1000b


As can be seen in this example, most of the precision available for y will not be 
available to carry out the addition. Maintaining full precision for floating-point addition 
requires extra mantissa bits beyond the 24 bits available on the DSP. Since the need for 
such precision is rare, software methods are used to represent the result of the operation 
as a floating-point number pair (z,zz). In our example, the exact result is represented as 
follows: 


z  = 2^{34} × 01.000 0000 0000 0000 0000 0011b
zz = 2^{9}  × 01.111 1111 1111 1111 1111 1000b


The corresponding hexadecimal representation of (z,zz) is shown below: 


z  = 22000003h (decimal: 1.71798753 × 10^{10})
zz = 097FFFF8h (decimal: 1.0239995 × 10^{3})


Some definitions are basic to the development of concepts in this report. First is 
the definition of the floating-point operations over a system R. The system contains all 
the possible floating-point numbers that the single-precision format of the TMS320C30 
can represent. All the floating-point arithmetic is carried out in base 2. Therefore, R can 
be represented as follows on the TMS320C30: 


R = {x | x = m(x) × 2^{e(x)}, |m(x)| < 2^{24}, -128 < e(x) < 127}

A floating-point operation is faithful if the result of the operation fl(x * y) equals either:

    the largest element of R that is smaller than or equal to (x * y), or
    the smallest element of R that is larger than or equal to (x * y),

where * represents one of the following floating-point operations: +, -, ×, ÷. In other
words, faithful refers to truncating the floating-point operation result. The floating-point
multiplier on the TMS320C30 saves the upper 40 bits of the mantissa in one of the
extended-precision registers [1] and drops the least significant byte of the result. By this
definition, the floating-point multiplication on the TMS320C30 is faithful. Since the
algorithms require the floating-point result to be in single-precision format, the
floating-point multiplication on the DSP must therefore be followed by a second truncation
step. Saving the contents of the extended-precision register to a memory location or masking
off the low 8 bits results in truncation.


A floating-point operation is optimal if for all x and y, the result of fl(x * y) is an element 
of R nearest to (x * y). In other words, the round-off error should not exceed one-half 
of the last remaining bit position. This is commonly referred to as rounding. 


The results of floating-point operations on the TMS320C30 are stored in the
extended-precision registers [1]. The extended-precision register adds 8 bits of precision
to the floating-point arithmetic result. Execution of the RND (round) instruction forces the
result of the floating-point arithmetic to be optimal. When you round the result of the
addition or subtraction operations on the TMS320C30, these floating-point operations become
optimal.


Implementing Doublelength Floating-Point Arithmetic 


This section presents the algorithms used in implementing doublelength arithmetic 
in pseudo-code for a number of fundamental floating-point operations. The basic idea of 
doublelength arithmetic can be extended to multiplelength precision, given that the start 
of the implementation is based on the highest precision available on the system. Therefore, 
to achieve quadruplelength results, the same algorithm can be applied to doublelength 
values, and so on. The implementation is based on the theoretical results presented in 
Reference [2]. 


Exact Singlelength Addition 


In this discussion of the algorithm used to carry out exact addition and its
implementation on the TMS320C30 DSP, the term exact refers to performing an operation on
two floating-point numbers, x and y, and obtaining a doublelength floating-point number pair
(z,zz) to represent the result. In this implementation, we have not accounted for
floating-point exponent overflow or underflow. For this algorithm to produce a correct
result, the floating-point addition and subtraction must be optimal.


The purpose of exact addition is to find a term, zz, that satisfies Equation (2).

z + zz = x + y                  (2)

Equation (2) can be rewritten as

zz = y - (z - x)                (3)

Equation (3) can be expanded into Equation (4).

w  = z - x                      (4)
zz = y - w


In particular, |x| >= |y| must hold for Equation (4) to be valid. Implementation
of Equation (4) on the TMS320C30 always generates the exact correction term zz if the
result of each floating-point addition operation is made optimal. This requirement guarantees
that the result of single-precision floating-point add and subtract belongs to system R. By
swapping the x and y values when |x| < |y|, the condition for obtaining an exact result
is met.
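
Equations (2) through (4) can be tried on a host computer. The following is a minimal
sketch, not the Appendix A routine: it assumes that the host float type is IEEE 754
single precision with round-to-nearest, which stands in for TMS320C30 operations followed
by RND, and the name add12 is borrowed only for illustration. The volatile qualifiers are
there to force every intermediate result to single precision.

```c
/* Exact sum of two floats: on return *z + *zz equals x + y exactly
   (overflow and underflow are not handled), with *z the rounded sum. */
void add12(float x, float y, float *z, float *zz)
{
    float ax = x < 0.0f ? -x : x;
    float ay = y < 0.0f ? -y : y;

    if (ax < ay) {                  /* swap so that |x| >= |y|         */
        float t = x;
        x = y;
        y = t;
    }

    volatile float sum = x + y;     /* z  = fl(x + y)                  */
    volatile float w = sum - x;     /* w  = fl(z - x),  Equation (4)   */

    *z = sum;
    *zz = y - w;                    /* zz = fl(y - w),  Equation (4)   */
}
```

Called with the values of the earlier example, x = 17179868160 and
y = 8191.99951171875, the sketch is expected to return the pair described there,
z = 17179875328 and zz = 1023.99951171875.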


The algorithm requires that x and y be normalized. Normalization guarantees that
the floating-point number has only one sign bit, and that sign bit is followed by nonsign
bits [1]. Floating-point addition on the TMS320C30 assumes that the operands are
normalized.


The TMS320C30 assembly code for obtaining the doublelength sum of two
singlelength floating-point numbers x and y is shown in Appendix A. First, the values
for x and y are interchanged when |x| < |y|. When you add x and y values, the number
with the smaller exponent, y, is shifted repeatedly until the exponents of x and y are equal
and their mantissas are aligned. We have now calculated the singlelength number, z, that
satisfies Equation (2). Since the floating-point addition on the TMS320C30 is made
optimal by rounding, the extra precision is, in effect, dropped. The extra precision value,
zz, is obtained by implementing Equation (4). Figure 2 is a graphical representation of
the implemented algorithm. The figure also shows the relationship between doublelength
number pair (z,zz) and singlelength floating-point numbers and their representation on
the TMS320C30.


Figure 2. Exact Singlelength Addition


The same algorithm can be used to implement exact floating-point subtraction on 
the DSP. This is accomplished by negating the second operand and performing an exact 
addition. 

Doublelength Addition 


A natural extension of exact singlelength addition and subtraction is its application
to doublelength arithmetic. Figure 3 shows an algorithm for implementing doublelength
addition on the DSP. Using this algorithm, you can add two doublelength numbers (x,xx)
and (y,yy) and represent the result as a doublelength number (z,zz).


The algorithm requires forming a doublelength number (r,rr) that represents an
exact addition of x and y. Generating a second number, s = ((rr + yy) + xx), results in
a number pair (r,s) that approximates the addition of (x,xx) and (y,yy). Finally, an exact
addition of r and s generates a doublelength number (z,zz) that has the same value as (x,xx)
+ (y,yy).

To obtain exact results for addition and subtraction, subtraction and addition must
be optimal; this is guaranteed by following each subtraction or addition instruction on
the DSP with a round instruction.


; Calculate the doublelength sum of (x,xx) and (y,yy),
; the result being (z,zz)
r = x + y;
if (abs(x) > abs(y))
    s = x - r + y + yy + xx;
else
    s = y - r + x + xx + yy;
z = r + s;
zz = r - z + s;


Figure 3. Doublelength Addition 
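
The pseudo-code of Figure 3 translates almost line for line into C. The following is a
minimal sketch under the assumption that the host float type is IEEE 754 single precision
with round-to-nearest (standing in for rounded TMS320C30 operations); the name add2 is
hypothetical and is not one of the Appendix A routines.

```c
/* Doublelength sum following Figure 3: (z,zz) approximates (x,xx) + (y,yy).
   volatile forces each intermediate result to single precision. */
void add2(float x, float xx, float y, float yy, float *z, float *zz)
{
    float ax = x < 0.0f ? -x : x;
    float ay = y < 0.0f ? -y : y;
    volatile float r = x + y;
    volatile float s;

    if (ax > ay) {              /* s = x - r + y + yy + xx */
        s = x - r;
        s = s + y;
        s = s + yy;
        s = s + xx;
    } else {                    /* s = y - r + x + xx + yy */
        s = y - r;
        s = s + x;
        s = s + xx;
        s = s + yy;
    }

    volatile float hi = r + s;  /* z  = r + s     */
    volatile float lo = r - hi;

    *z = hi;
    *zz = lo + s;               /* zz = r - z + s */
}
```

The terms of s are accumulated from left to right, one rounded operation at a time,
which is how we read the expressions in Figure 3.
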
Exact Singlelength Multiplication 


The exact singlelength multiplication is shown in Figure 4. The algorithm requires 
breaking the x and y mantissas into half-length numbers, referred to as head (hx,hy) and 
tail (tx,ty) sections [2]. This algorithm requires addition and subtraction to be optimal 
and multiplication faithful. The TMS320C30 DSP multiplication result is faithful if the 
contents of the extended-precision register are truncated. 


To split x and y into two half-length numbers, a constant value is needed that is 
dependent on the number of available digits. The TMS320C30 device has t = 24 bits 
of mantissa in the single-precision format. Equation (5) shows that head section hx is chosen 
to be as near to the value of x as possible. 


hx = round(m(x) × 2^{-tl}) × 2^{e(x)+tl}                (5)


Also, tl is chosen to be approximately one-half of the available precision, or 12,
on the processor. This effectively breaks the mantissa into half-length values. Equation
(5) shows that hx is obtained by rounding and is defined to be an element of R{tl}. The
tail section tx is easily obtained by subtracting hx from x. Since floating-point subtraction
can be made optimal on the TMS320C30, it follows that tx is an element of R{tl - 1}.
Setting the constant equal to 2^{12} does not always satisfy Equation (5) when t is even.
When the constant is set to 2^{12} + 1, the definition of Equation (5) is satisfied. The
proof for the above is given in Reference [2].


; Calculate the exact product of x and y, the result being
; a doublelength number (z,zz). This algorithm uses the
; following syntax when called from a user program as shown
; mult12 (x,y,z,zz);
p = x * constant;
hx = x - p + p;
tx = x - hx;
p = y * constant;
hy = y - p + p;
ty = y - hy;
p = hx * hy;
q = hx * ty + tx * hy;
z = p + q;
zz = p - z + q + tx * ty;


Figure 4. Exact Singlelength Product 
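
The splitting step and the product of Figure 4 can be exercised on a host computer. The
following is a minimal sketch, assuming IEEE 754 single-precision float with
round-to-nearest in place of the TMS320C30 operations; the helper name split and the macro
SPLIT_CONSTANT are hypothetical, and overflow and underflow are not handled.

```c
#define SPLIT_CONSTANT 4097.0f      /* 2^12 + 1 for a 24-bit mantissa */

/* Split a into a half-length head and tail with a == *head + *tail. */
static void split(float a, float *head, float *tail)
{
    volatile float p = a * SPLIT_CONSTANT;   /* p  = a * constant */
    volatile float d = a - p;

    *head = d + p;                            /* hx = a - p + p    */
    *tail = a - *head;                        /* tx = a - hx       */
}

/* Exact product following Figure 4: x * y == *z + *zz. */
void mult12(float x, float y, float *z, float *zz)
{
    float hx, tx, hy, ty;

    split(x, &hx, &tx);
    split(y, &hy, &ty);

    volatile float p = hx * hy;              /* exact: half-length factors */
    volatile float q = hx * ty + tx * hy;
    volatile float hi = p + q;               /* z = p + q                  */
    volatile float d = p - hi;
    volatile float e = d + q;

    *z = hi;
    *zz = e + tx * ty;                       /* zz = p - z + q + tx * ty   */
}
```

As an example, for x = y = 16777215 (that is, 2^{24} - 1) the split gives hx = 2^{24}
and tx = -1, and the product 2^{48} - 2^{25} + 1 is returned as z = 2^{48} - 2^{25}
and zz = 1.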


Doublelength Multiplication 

The doublelength multiplication algorithm, shown in Figure 5, relies on the 
singlelength algorithm discussed earlier. The algorithm generates a nearly doublelength 
approximation of the output result (c,cc). Note that the exact singlelength multiplication 
routine is used for this approximation. Exact addition is used to generate a doublelength 
floating-point number that is the closest approximation to the actual result. 


The doublelength product program implementation uses the TMS320C30 stack
capabilities to save some intermediate variables. These programs are written to be used
as callable functions or macros in your program. In either case, the stack pointer must
be set to a valid memory segment for proper code execution.


; Calculate the doublelength product of (x,xx) and (y,yy)
; the result being a nearly doublelength number (z,zz).
; Program uses exact singlelength multiplication, mult12 (.).
mult12 (x, y, c, cc);
cc = x * yy + xx * y + cc;
z = c + cc;
zz = c - z + cc;


Figure 5. Exact Doublelength Product 
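
Figure 5 can be sketched in C as shown below. This is an illustration only, assuming
IEEE 754 single-precision float with round-to-nearest on the host; mult12 is repeated in
compact form so that the fragment stands alone, and the name mult2 and the macro
SPLIT_CONSTANT are used here hypothetically.

```c
#define SPLIT_CONSTANT 4097.0f      /* 2^12 + 1, as in Figure 4 */

/* Exact product x * y == *z + *zz (Figure 4), in compact form. */
static void mult12(float x, float y, float *z, float *zz)
{
    volatile float p, d;            /* volatile: keep single-precision rounding */
    float hx, tx, hy, ty, q;

    p = x * SPLIT_CONSTANT; d = x - p; hx = d + p; tx = x - hx;
    p = y * SPLIT_CONSTANT; d = y - p; hy = d + p; ty = y - hy;
    p = hx * hy;
    q = hx * ty + tx * hy;
    *z = p + q;
    d = p - *z;
    *zz = (d + q) + tx * ty;
}

/* Nearly doublelength product following Figure 5. */
void mult2(float x, float xx, float y, float yy, float *z, float *zz)
{
    float c, cc;

    mult12(x, y, &c, &cc);          /* (c,cc) = x * y exactly            */
    cc = (x * yy + xx * y) + cc;    /* add the cross terms; xx * yy is   */
                                    /* below the retained precision      */
    *z = c + cc;
    *zz = (c - *z) + cc;
}
```

With (x,xx) = (y,yy) = (1, 2^{-30}), the sketch returns z = 1 and zz = 2^{-29}; the
neglected term xx × yy = 2^{-60} is well inside the error bound discussed in the Error
Analysis section.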


Doublelength Quotient and Square Root 


Figures 6 and 7 show the algorithm used in calculating the doublelength quotient 
and doublelength square root routines. Singlelength multiplication is used to generate a 
doublelength approximation of the quotient or square root values. As with doublelength 
multiplication, exact addition is used to generate a doublelength floating-point result. 


; Calculate the doublelength quotient of (x,xx) and (y,yy)
; the result being (z,zz)
c = x / y;
mult12 (c, y, u, uu);
cc = (x - u - uu + xx - c * yy) / y;
z = c + cc;
zz = c - z + cc;


Figure 6. Doublelength Quotient 


; Calculate the doublelength square root of (x,xx), the
; result being (z,zz)
if (x > 0) {
    c = sqrt (x);
    mult12 (c, c, u, uu);
    cc = (x - u - uu + xx) * 0.5 / c;
    z = c + cc;
    zz = c - z + cc; }
else {
    z = zz = 0.0; }


Figure 7. Doublelength Square Root 
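
Figures 6 and 7 can be sketched in C in the same way. The fragment below is an
illustration under stated assumptions, not the Appendix A code: host float is taken to be
IEEE 754 single precision with round-to-nearest, mult12 is repeated in compact form so the
fragment stands alone, and sqrt_single is a hypothetical stand-in (a Newton iteration
carried out in double) for whatever singlelength square root is available.

```c
#define SPLIT_CONSTANT 4097.0f      /* 2^12 + 1, as in Figure 4 */

/* Exact product x * y == *z + *zz (Figure 4), in compact form. */
static void mult12(float x, float y, float *z, float *zz)
{
    volatile float p, d;            /* volatile: keep single-precision rounding */
    float hx, tx, hy, ty, q;

    p = x * SPLIT_CONSTANT; d = x - p; hx = d + p; tx = x - hx;
    p = y * SPLIT_CONSTANT; d = y - p; hy = d + p; ty = y - hy;
    p = hx * hy;
    q = hx * ty + tx * hy;
    *z = p + q;
    d = p - *z;
    *zz = (d + q) + tx * ty;
}

/* Hypothetical singlelength square root for x > 0 (Newton iteration). */
static float sqrt_single(float x)
{
    double g = x > 1.0f ? (double)x : 1.0;   /* start at or above sqrt(x) */

    for (int i = 0; i < 100; ++i)
        g = 0.5 * (g + (double)x / g);
    return (float)g;
}

/* Doublelength quotient following Figure 6. */
void div2(float x, float xx, float y, float yy, float *z, float *zz)
{
    float c, cc, u, uu;

    c = x / y;
    mult12(c, y, &u, &uu);
    cc = ((((x - u) - uu) + xx) - c * yy) / y;
    *z = c + cc;
    *zz = (c - *z) + cc;
}

/* Doublelength square root following Figure 7. */
void sqrt2(float x, float xx, float *z, float *zz)
{
    if (x > 0.0f) {
        float c, cc, u, uu;

        c = sqrt_single(x);
        mult12(c, c, &u, &uu);
        cc = (((x - u) - uu) + xx) * 0.5f / c;
        *z = c + cc;
        *zz = (c - *z) + cc;
    } else {
        *z = *zz = 0.0f;
    }
}
```

In both routines the singlelength result c is corrected by a term cc computed from the
exact residual x - c × y (or x - c × c), which mult12 makes available as the pair (u,uu).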


Error Analysis 


This section discusses and determines an upper bound for the error generated in
forming a doublelength result. The value of the doublelength number (z,zz) is equal to
z + zz. Singlelength addition, subtraction, and multiplication results are always exact.
In doublelength addition, any error introduced in the end result is generated by calculating
the zz term. An upper bound error magnitude has been calculated in Reference [2] and
is shown in Equation (6) as follows:


|E+| ≤ {|x + xx| + |y + yy|} × 2^{2-2t} = |z| × 2^{2-2t}                (6)


where t = 24 for this system. This gives an upper bound of |z| × 2^{-46}, or
approximately |z| × 1.42 × 10^{-14}. This translates to a theoretical accuracy greater than
13 decimal places. Table 1 shows an example of doublelength addition using the exact addition
algorithm previously described. The numbers in the left column represent TMS320C30
hexadecimal notation for the floating-point results, and (z,zz) is the decimal equivalent
of the doublelength output result. Appendix B shows a listing of C programs (exact) that
convert from TMS320C30 hexadecimal notation to decimal notation.


Table 1. Exact Singlelength Arithmetic Examples


Singlelength Addition

x  = 217FFFFFh
y  = 0C7FFFFFh
z  = 22000003h     (z,zz) = 17179876351.9995117 (Exact)
zz = 097FFFF8h              17179876351.9995117 (DSP)

x  = FC7CB923h
y  = 0A29A7E5h
z  = 0A29ABD8h     (z,zz) = 1357.37010409682989 (Exact)
zz = EFA46000h              1357.37010409682989 (DSP)


Singlelength Multiplication

x  = 0F7FFFFFh
y  = 21FFFFFFh
z  = 30800000h     (z,zz) = -562949986975740 (Exact)
zz = 18800002h              -562949986975740 (DSP)

x  = FC7CB923h
y  = 0A29A7E5h
z  = 07277BF7h     (z,zz) = 167.484236862815123 (Exact)
zz = EBA714F0h              167.484236862815123 (DSP)


The doublelength product, quotient, and square-root algorithms all have a small
relative error. The upper bound error magnitude for each is given in Equations (7) through
(9).


|E×| ≤ (|x + xx| × |y + yy|) × 11 × 2^{-48}                (7)
|E÷| ≤ (|x + xx| ÷ |y + yy|) × 21.1 × 2^{-48}              (8)
|E√| ≤ sqrt(|x + xx|) × 12.7 × 2^{-48}                     (9)


Equation (7) establishes an upper bound of |z| × 3.9 × 10^{-14}, or approximately
13 decimal digits of accuracy, for doublelength multiplication. Similarly, Equation (8)
establishes an upper bound of |z| × 7.5 × 10^{-14} for the doublelength quotient, and
Equation (9) an upper bound of |z| × 4.5 × 10^{-14} for the doublelength square root, each
greater than 13 decimal digits. Table 2 shows examples for each algorithm discussed, along
with the algorithm output and expected theoretical output.


Table 2. Exact Doublelength Arithmetic Examples


Doublelength Multiplication

x  = 22000000h
xx = 097FFFFEh
y  = 21000001h
yy = 097FFFFEh
z  = 43000002h     (z,zz) = 1.47573996570139475 × 10^{20} (Exact)
zz = 2A7FFFFCh              1.47573996570139427 × 10^{20} (DSP)

x  = 22000003h
xx = 097FFFF8h
y  = 0A29ABD8h
yy = EFA46000h
z  = 2C29ABDDh     (z,zz) = 23319450552284.2434 (Exact)
zz = 13907DC2h              23319450552284.1250 (DSP)


Doublelength Quotient

x  = 43000002h
xx = 2A7FFFFCh
y  = 2C29ABDDh
yy = 13907DC2h
z  = 1641205Ah     (z,zz) = 6328365.08044074177 (Exact)
zz = FC24BE20h              6328365.08044075966 (DSP)

x  = 22000000h
xx = 097FFFFEh
y  = 21000001h
yy = 097FFFFEh
z  = 007FFFFDh     (z,zz) = 1.99999964237223082 (Exact)
zz = D3400000h              1.99999964237217398 (DSP)


Doublelength Square Root

x  = 2C2BDD00h
xx = 13907DC2h
z  = 161451A4h     (z,zz) = 4860114.04539400958 (Exact)
zz = FB39EF11h              4860114.04539400712 (DSP)

x  = 21000001h
xx = 097FFFFEh
z  = 103504F5h     (z,zz) = 92681.9110722252960 (Exact)
zz = F7BC0784h              92681.9110722253099 (DSP)

Note that the results were obtained using the programs shown in Appendix B. The
C programs were created and compiled on an 80386-based microcomputer running under
MS-DOS 3.3.


How to Generate C-Callable Functions 


The source listings for the extended-precision arithmetic presented in Appendix A 
are optimized for execution speed and code size. These routines are designed to be used 
as macros in a user program environment or, with a few adjustments, as a C function. 


This section provides an overview of TMS320C30 C compiler calling conventions 
necessary to create functions that can be added to the C compiler library. You need a 
working knowledge of C language to understand the terminology in this section [4, 5, 6]. 


The C compiler uses the processor stack to pass arguments to functions, store local 
variables, and save temporary values. The C compiler uses two registers of the TMS320C30 
to manage the stack pointer (SP) and the frame pointer (AR3). 


When a C program calls a function, it must 


1. Push the arguments onto the stack, 
2. Call the function, and 
3. Pop the arguments off the stack, 


in that order. 
On the other hand, the called C function must perform the following tasks:


1. Set up a local frame by saving the old frame pointer on the stack.
2. Assign the new frame pointer to the current value of stack pointer.
3. Allocate the frame.
4. Save any dedicated registers that the function modifies.
5. Execute function code.
6. Store a scalar value in R0.
7. Deallocate the frame.
8. Lastly, restore the old frame pointer [4].


The following code segment shows the singlelength addition routine modified to be 
in C-callable form. Note that registers R4 through R7 and AR4 through AR7 are dedicated 
registers used by the compiler. These registers must be saved as floating-point values. 


single  .set    0FFh
fp      .set    ar3
x       .set    r0
y       .set    r1
z       .set    r2
zz      .set    r3
w       .set    r4
x1      .set    r2
y1      .set    r3
        .global __add12
        .width  96
        .text
__add12:
        push    fp              ; Save old fp
        ldi     sp,fp           ; Point to top of stack
        pushf   r4              ; Save dedicated register r4
        push    r4
        ldf     *-fp(2),r0      ; Load x into r0
        ldf     *-fp(3),r1      ; Load y into r1
        absf    x,x1            ; x1 = |x|
        absf    y,y1            ; y1 = |y|
        cmpf    y1,x1           ; Is |x| > |y|?
        ldflt   x,x1            ; If |x| < |y|, swap x and y
        ldflt   y,x
        ldflt   x1,y
        addf3   x,y,z           ; z = x + y
        rnd     z
        subf3   x,z,w           ; Form w = z - x
        rnd     w
        subf3   w,y,zz          ; zz = y - w
        rnd     zz
        pop     r4              ; Restore dedicated register r4
        popf    r4
        pop     fp              ; Restore fp
        retsu
        .end


Conclusion 


This report presented an implementation of extended-precision arithmetic routines
for the TMS320C30 DSP. The programs presented include singlelength floating-point
addition, subtraction, and multiplication, which produce exact doublelength results.
Doublelength floating-point addition, subtraction, multiplication, division, and square root
were also presented. The doublelength floating-point routines all had a small relative
error that appeared in the correction term zz. However, it has been shown that the accuracy
of the doublelength floating-point result is at least 13 decimal digits. Table 3 is a summary
of information about the routines contained in Appendices A and B. Execution times shown
in the table are given only for the routines in Appendix A. These times do not include
the call and return if the routine is implemented as a called function. They also do not
include any context saves and restores that may be required.


Table 3. Summary Information


                                                          Code Size   Execution
Routine                              Name      Appendix   (Words)     (Cycles)

Singlelength Add                     _add12
Doublelength Add                     __dbladd
Singlelength Multiply                __mult12
Doublelength Multiply                __mult2
Doublelength Divide                  __div2
Doublelength Square Root             __sqrt2
Change Two Single-Precision
  TMS320C30 Numbers to One
  Double-Precision Result            C30DBL
Change Two Double-Precision
  TMS320C30 Numbers to a
  Double-Precision Result            C30DBL2
*
* AUTHOR: Al Lovrich 2/21/89
*         Texas Instruments, Inc.
*
* Entry Conditions:
*         Upon entry (r0,r1) contains (x,xx) and
*         (r2,r3) contains (y,yy).
*
* Registers Affected:
*
* Revision: Original